Week 9.5 - Verification Protocols for a Moving Target

🎯 What We'll Cover

The previous sub-lessons established the trajectory (9.1), the failure taxonomy (9.2), the genuine capability cases (9.3), and the durable epistemic concerns (9.4). This sub-lesson is the practical pay-off: how do you actually verify AI output in your own research workflow, and how do you read claims about AI capability critically when the literature is always six to twelve months behind the artefact?

Verification works at two layers. The first is verification of specific AI outputs: techniques you apply to a particular analysis, citation, or proof. The second is verification of capability claims: the meta-skill of reading the AI literature without being fooled by dated examples or selection bias. You need both.

🔎 Layer 1: Verifying Specific AI Outputs

These techniques are the operational consequence of the failure taxonomy. Each targets a specific failure mode from 9.2.

Known-Answer Testing

Run AI through tasks where you know the correct answer in advance — synthetic data with planted truths, problems from your field where the answer is published, edge cases you've constructed. Check whether the output matches. The Week 7 silent-error problem — code that runs but is wrong — is detected here, nowhere else. Build small known-answer sets in your own domain and re-run them whenever you change models.
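A minimal sketch of a known-answer test, assuming the analysis under scrutiny is a function that estimates a slope from paired data. The synthetic dataset has a planted slope of 2.5, so a correct estimator must recover it within a tolerance. The function names, the tolerance, and the deliberately buggy estimator are illustrative choices, not part of the course materials.

```python
import random


def make_planted_data(n=500, true_slope=2.5, intercept=1.0, noise=0.5, seed=0):
    """Synthetic data with a known (planted) linear relationship."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [intercept + true_slope * x + rng.gauss(0, noise) for x in xs]
    return xs, ys


def ols_slope(xs, ys):
    """Ordinary least-squares slope: stands in for correct AI-written code."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var


def slope_through_origin(xs, ys):
    """A silent error: runs cleanly but drops the intercept, biasing the slope."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def passes_known_answer(estimator, true_slope=2.5, tol=0.05):
    """True if `estimator` recovers the planted slope to within `tol`."""
    xs, ys = make_planted_data(true_slope=true_slope)
    return abs(estimator(xs, ys) - true_slope) < tol


if __name__ == "__main__":
    for fn in (ols_slope, slope_through_origin):
        print(fn.__name__, "passes" if passes_known_answer(fn) else "FAILS")
```

Both estimators run without raising an error; only the planted answer tells them apart. Re-running `passes_known_answer` after switching models or regenerating code costs almost nothing.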

Adversarial Prompting

Deliberately try to break outputs. Ask trick questions, push back on initial answers, change framings to see whether responses are stable. The Hallucination Hunting activity in 9.6 is exactly this. The point is not to catch the AI being wrong once — it is to map where the AI is fragile in your domain.

Cross-Model Triangulation

Ask the same question of Claude Opus 4.7, GPT-5.5, Gemini 3.1 Pro, and an open-weights model such as DeepSeek V4 Pro. If they agree, that is some evidence the answer is correct, provided overlapping training data has not produced a shared error. If they disagree, that is a stronger signal: investigate the question yourself rather than trusting any single output.
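One way to make triangulation routine is to collect each model's answer (by hand or through whatever client you already use) and let a small helper flag disagreement. This sketch assumes short, directly comparable answers such as a number or a yes/no; the model labels are placeholders. Unanimous agreement is reported as “consistent”, not “correct”, because a shared training-data error would also be unanimous.

```python
from collections import Counter


def triangulate(answers):
    """Compare answers from several models.

    answers: mapping of model label -> short answer string.
    Returns ("consistent" | "investigate", Counter of normalised answers).
    """
    normalised = [a.strip().lower() for a in answers.values()]
    counts = Counter(normalised)
    # Unanimity is weak evidence only: shared training data can yield shared errors.
    verdict = "consistent" if len(counts) == 1 else "investigate"
    return verdict, counts


if __name__ == "__main__":
    answers = {"model_a": "0.05", "model_b": "0.05 ", "model_c": "0.01"}
    print(triangulate(answers))
```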

The “Teach It Back” Test

If you cannot explain the AI's reasoning in your own words, you don't understand it, and you should not trust it. This applies whether the AI is doing your statistics, your code review, or your conceptual synthesis. The Messeri & Crockett illusions (9.4) are specifically about the gap between feeling you understand and actually understanding. The teach-it-back test exposes that gap so you can close it.

Manual Spot-Checks

Even when overall reliability is high, manually verify a sample of outputs. Every Nth citation. Every Nth analytical step. Random samples of generated text. The discipline is not catching every error; it is keeping your own intuitions calibrated as the work scales.
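A minimal sketch of the sampling step, assuming outputs are held in an ordered list (citations, analytical steps, generated paragraphs). It combines the every-Nth rule with a few random extras, since a fixed stride alone can systematically skip errors that recur at the same interval. N, the number of extras, and the function name are illustrative choices.

```python
import random


def spot_check_indices(n_items, every_n=10, random_k=5, seed=None):
    """Indices to verify by hand: every Nth item plus `random_k` random extras."""
    if every_n < 1:
        raise ValueError("every_n must be at least 1")
    systematic = set(range(every_n - 1, n_items, every_n))
    remaining = [i for i in range(n_items) if i not in systematic]
    rng = random.Random(seed)
    extras = set(rng.sample(remaining, min(random_k, len(remaining))))
    return sorted(systematic | extras)


if __name__ == "__main__":
    citations = [f"ref-{i}" for i in range(100)]  # placeholder list of outputs
    for i in spot_check_indices(len(citations), every_n=10, random_k=5, seed=42):
        print(i, citations[i])
```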

Citation Verification

Direct re-use of the Week 5 framework. The Five-Point Citation Check (existence, authors, year, venue, claim verification) applies to every AI-generated citation. Frontier models still hallucinate citations on niche topics. The fix is mechanical, not mysterious; treat citation verification as a pipeline step, not an afterthought.
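The five points can be written down as an explicit pipeline step. This is a sketch under stated assumptions: `primary` is a record you retrieved yourself from the publisher, DOI resolver, or library catalogue (no lookup API is assumed), and claim verification stays a human judgement passed in as a flag, because no string comparison can confirm that a paper says what it is cited for. The class, function, and example records are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Citation:
    authors: List[str]  # author surnames, in order
    year: int
    venue: str          # journal, conference, or publisher


def _norm(text: str) -> str:
    """Case- and whitespace-insensitive form for comparison."""
    return " ".join(text.lower().split())


def five_point_check(ai: Citation, primary: Optional[Citation],
                     claim_confirmed: bool) -> Dict[str, bool]:
    """Five-Point Citation Check for one AI-generated citation.

    primary: the record found in a primary source, or None if none exists.
    claim_confirmed: your own judgement after reading the source.
    """
    exists = primary is not None
    return {
        "existence": exists,
        "authors": exists and ([_norm(a) for a in ai.authors]
                               == [_norm(a) for a in primary.authors]),
        "year": exists and ai.year == primary.year,
        "venue": exists and _norm(ai.venue) == _norm(primary.venue),
        "claim": exists and claim_confirmed,
    }


if __name__ == "__main__":
    # Hypothetical records: the AI's version and the one found in the catalogue.
    ai_ref = Citation(["Smith", "Jones"], 2021, "Journal of Examples")
    found = Citation(["Smith", "Brown"], 2021, "Journal of Examples")
    result = five_point_check(ai_ref, found, claim_confirmed=False)
    print(result, "PASS" if all(result.values()) else "FAIL")
```

A citation leaves the pipeline only if every one of the five values is true.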

Reproducibility Testing

Does the same prompt produce the same answer on different runs? Different sessions? Different models in the same family? Variable outputs are a signal — either of model uncertainty (informative) or of stochasticity that may matter for your downstream conclusions. Test before relying on a single run.
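A sketch of the repeated-run test, assuming `ask` is any function you supply that sends a prompt to a model and returns its text (a thin wrapper around whichever client you already use; none is assumed here). It reports the most common answer and the fraction of runs that agree with it. What counts as “the same answer” is a design choice; exact matching after stripping whitespace, used here, is the strictest option.

```python
import random
from collections import Counter


def reproducibility(ask, prompt, runs=10):
    """Send the same prompt `runs` times; return (modal answer, agreement rate).

    ask: any callable taking a prompt string and returning the model's text.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    answers = [ask(prompt).strip() for _ in range(runs)]
    modal, count = Counter(answers).most_common(1)[0]
    return modal, count / runs


if __name__ == "__main__":
    rng = random.Random(0)

    def fake_ask(prompt):
        # Stand-in for a real model call: usually "yes", occasionally "no".
        return "yes" if rng.random() < 0.8 else "no"

    print(reproducibility(fake_ask, "Is the effect significant?", runs=20))
```

An agreement rate well below 1.0 on a question with a single correct answer is the signal the text describes; deciding whether it reflects informative uncertainty or mere sampling noise still needs your judgement.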

Domain-Expert Spot-Checks

For cross-disciplinary work, recruit a domain expert to check AI output in their field. The Dunning–Kruger trap from 9.4 means you cannot reliably assess AI output in fields you don't know well. The fix is not to try harder; it is to ask someone who can.

📊 Layer 2: Verifying Capability Claims (the Dated-Research Trap)

This is the meta-skill the rest of the week has been building towards. When you encounter a published claim about what AI can or cannot do — in a journal article, a conference paper, a news story, a textbook — treat it as a hypothesis about a system that has since been replaced.

📝 The Three-Question Check

1. Which model and version? A 2023 paper that says “LLMs cannot do X” almost certainly tested a system such as GPT-3.5, GPT-4, or Claude 1. None of those is current. Without knowing which specific system was tested, the claim has no clear referent.

2. When was it tested? Anything pre-2024 is now historical. The artefact under study has been replaced two or three times since. Useful as a historical landmark; not useful as a current diagnosis.

3. Has anyone retested with current frontier models? If yes, what did the retest find? If no, the claim is unproven on present systems: not falsified, just unproven. Treat it with the caution you would give a 2010 claim about smartphone battery life if someone applied it to 2026 phones.

A specific case to keep in mind: Frieder et al. (2023), “Mathematical Capabilities of ChatGPT”, found that GPT-4 was inadequate for graduate-level mathematics. It was a careful, well-cited study at the time. By May 2026, the same questions put to GPT-5.5 Pro, Gemini Deep Think, or AlphaEvolve produce very different results (see 9.3). Citing Frieder et al. as if it described current AI is now an error, yet the citation persists in the literature precisely because it was a good paper when published.
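The three questions can also serve as a checklist when you log capability claims from the literature. A minimal sketch, assuming you record the tested model, the year of testing, and any retest finding you have located; the 2024 cut-off follows the “anything pre-2024 is now historical” rule above, and the function name and output wording are illustrative.

```python
from typing import Optional

HISTORICAL_BEFORE = 2024  # per the text: anything tested pre-2024 is historical


def three_question_check(model: Optional[str], year_tested: Optional[int],
                         retest_finding: Optional[str] = None) -> str:
    """Classify a published capability claim using the three questions."""
    # Q1: which model and version? Without one, the claim has no clear referent.
    if not model:
        return "no clear referent: tested system not stated"
    # Q3: has anyone retested on current frontier models? A retest supersedes the original.
    if retest_finding:
        return f"retested on current models: {retest_finding}"
    # Q2: when was it tested? An unknown date is treated as dated.
    if year_tested is None or year_tested < HISTORICAL_BEFORE:
        return f"historical landmark ({model}, {year_tested}): unproven on present systems"
    return f"recent ({model}, {year_tested}): look for a retest before relying on it"


if __name__ == "__main__":
    # The Frieder et al. (2023) case, with no retest recorded in this log.
    print(three_question_check("GPT-4", 2023))
```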

📝 A Concrete Example: The Reversal Curse

Berglund et al. (2023) documented the reversal curse, in which a model trained on “A is B” fails to learn “B is A”, in models of the GPT-3.5/GPT-4 generation. The paper has been cited several thousand times. It was an important contribution at the time.

By 2026, the reversal curse is no longer a prominent failure in benchmark discussions. Frontier models largely handle it correctly. A 2026 paper that opens with “LLMs famously suffer from the reversal curse (Berglund et al., 2023)” has not retested the claim and is treating a historical finding as a current diagnosis.

The fix: cite Berglund et al. as the originating documentation of a failure mode that has since been mitigated, not as evidence about current model behaviour. The same disciplinary move applies to most pre-2025 capability claims.

🤸 Building Verification Habits

Verification is not a one-time audit. It is a habit that survives model releases. Three commitments that make verification durable:

Verification as a workflow component, not an afterthought. Every AI-assisted analysis pipeline should have a verification step before any output leaves your workspace. Bake it into the workflow; don't leave it to memory or virtue.
Retest cadence. When a new frontier model drops, your previous calibration may be wrong — in either direction. Re-run your known-answer test set. Re-test the failure cases that worried you. Re-test the capability cases that worked. The model that's frontier today is not the one you calibrated against six months ago.
Write down what you found. Document the model, the date, the prompts, the outputs, the verification steps. This is partly for reproducibility, partly for your own future self when you wonder “was that working back then or am I misremembering?”, and partly so the field can learn from your domain-specific findings.
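One lightweight way to keep that record is an append-only JSON Lines file, one verification run per line, using only the Python standard library. The field names below are assumptions chosen to mirror the list above (model, date, prompt, output, verification steps); adapt them to your own lab's conventions.

```python
import json
from dataclasses import asdict, dataclass
from datetime import date
from typing import List


@dataclass
class VerificationRecord:
    model: str          # exact model name and version used
    tested_on: str      # ISO date string, e.g. "2026-05-01"
    prompt: str
    output: str
    checks: List[str]   # verification steps applied, in order
    passed: bool


def to_jsonl(record: VerificationRecord) -> str:
    """Serialise one record as a single JSON line."""
    return json.dumps(asdict(record), ensure_ascii=False)


def from_jsonl(line: str) -> VerificationRecord:
    """Rebuild a record from one JSON line."""
    return VerificationRecord(**json.loads(line))


def append_record(path: str, record: VerificationRecord) -> None:
    """Append one record to the log; earlier entries are never overwritten."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(to_jsonl(record) + "\n")


if __name__ == "__main__":
    rec = VerificationRecord(
        model="model-name-and-version",  # placeholder
        tested_on=date.today().isoformat(),
        prompt="Summarise the attached methods section.",
        output="(model output here)",
        checks=["known-answer set", "five-point citation check"],
        passed=True,
    )
    print(to_jsonl(rec))  # pass rec to append_record(path, rec) to log it
```

Appending rather than overwriting preserves what worked under earlier models, which is exactly the question a retest raises.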

📝 The Course's Own Verification Practice

This course practises what it teaches. The materials are AI-assisted (see the AI Content Disclaimer on the home page) and have been verified using a custom /verify-references skill that does exactly the kind of checking described above: every URL fetched, every named citation cross-checked against primary sources, every statistical claim traced back to source.

In building these materials we have caught and fixed errors of exactly the patterns documented this week:

Wrong arXiv IDs (a Gupta “Chasing Carbon” reference linked to a maths paper on Fourier–Stieltjes algebras)
Hallucinated authors (“Tamburrino et al.” for what was actually Wright et al.; “Anthony et al.” for what was actually Luccioni et al.; “Humbel et al.” for what was actually Crosilla et al.)
Wrong DOIs (a Lund et al. citation pointed at a paper about academic promotions, not ChatGPT)
Hallucinated post titles (a Mollick blog post on AI ethics that doesn't exist)
Wrong article titles (a Nature article cited with a hallucinated subtitle; an MIT Technology Review article cited with its URL slug as the “title”)
Wrong link slugs (the Okolo Brookings article URL had “equitable” where it should have had “inclusive”; the ICRC autonomous-weapons URL had its words in the wrong order)

All caught by the verification process. Some had been in place across multiple weeks. The lesson: even careful AI-assisted writing produces these errors at non-trivial rates. Verification catches them; without verification, they persist.

🧠 A 2026 Open Question: “Teaching Claude Why”

In May 2026 Anthropic published research on training models not just to follow safety rules but to understand the reasons behind those rules. The thinking: a model that understands why a rule exists can navigate novel situations the rule-makers didn't anticipate, and can articulate trade-offs when rules conflict.

This raises a verification question that doesn't have a settled answer yet. If a model is being trained to understand why it should behave certain ways, does that change how we verify its outputs? Does understanding motivation make verification easier (the model can explain its reasoning) or harder (the model can construct plausible justifications for almost anything)? It is too early to say definitively. Worth tracking as the research develops — and worth thinking about for your own discipline: when AI systems can articulate why they reached a conclusion, what new verification opportunities and pitfalls emerge?

The verification stance

Verification is not a sign of distrust in AI. It is a sign of taking AI seriously as a research tool. Tools you take seriously, you check. Microscopes get calibrated. Statistical packages get validated against worked examples. AI outputs get verified against primary sources.

The researcher who never verifies their AI outputs is making a methodological mistake. The researcher who verifies systematically is doing their job.

👉 What Comes Next

Sub-Lesson 9.6 — Hands-On Activities and Assessment. Three activities that put the trajectory and verification frames into practice in your own field, and a final assessment that asks you to write your own dated capability/limitation snapshot. The deliverable explicitly acknowledges that it will go stale — that is the teaching move.